Abstract. A number of developments have taken place since the formulation of Marr and Poggio's
theory of human stereo vision. In particular, these concern the shape of the underlying receptive
fields, the control of eye movements and the role of neuronal pools in the so-called pulling effect.
These and other connected matters are briefly discussed. 

1. Some Comments on a Recent Theory of Stereopsis

We recently proposed an algorithm for the matching process in human stereopsis 
(Marr and Poggio, 1977, 1979). A number of points have developed since the theory was 
formulated in 1977 and we feel that they are interesting enough to deserve being 
discussed explicitly. In this paper we raise these points and discuss the present stance of 
the theory with respect to psychophysical, physiological and computational data. 



2. The Shape of the Underlying Receptive Fields 

According to the theory each image is filtered by oriented second derivative (bar) 
masks of four different sizes and the zero-crossings are matched between the two filtered 
images. Under rather weak assumptions, it can be shown that the oriented masks can be 
replaced by circularly symmetric masks (Marr and Hildreth, 1979) and we feel that the 
physiological implementation is based on this scheme. Specifically we believe (Marr and 
Hildreth, 1979; Marr and Ullman, 1979) that the LGN center-surround cells are the 
filters, whereas cortical simple cells are oriented zero-crossing detectors and should not be 
thought of as oriented bar mask filters. Such a scheme has been implemented on the
MIT Artificial Intelligence Laboratory computer (Grimson and Marr, 1979) and it leads
to an efficient stereo matching program. Mayhew and Frisby (1978) have independently 
found psychophysical evidence that the underlying filters in stereopsis are not oriented. 
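
As a rough illustration of filtering followed by zero-crossing detection, the sketch below works on a single horizontal scan line. It is not the Grimson and Marr program: the one-dimensional second derivative of a Gaussian stands in for the circularly symmetric mask, and the function names, the border handling and the value of `sigma` are all assumptions made for the example.

```python
import math

def second_derivative_gaussian(sigma, radius=None):
    """Sampled second derivative of a Gaussian (1-D stand-in for a center-surround mask)."""
    if radius is None:
        radius = int(math.ceil(4 * sigma))  # assumed truncation at 4 sigma
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2.0 * sigma * sigma))
        kernel.append((x * x / sigma ** 4 - 1.0 / sigma ** 2) * g)
    return kernel

def convolve_same(signal, kernel):
    """Same-length convolution; border samples are replicated (an assumed convention)."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + r - k, 0), n - 1)  # clamp the index at the borders
            acc += weight * signal[j]
        out.append(acc)
    return out

def zero_crossings(values, eps=1e-9):
    """Indices i where the filtered signal changes sign between i and i + 1."""
    return [i for i in range(len(values) - 1)
            if (values[i] > eps and values[i + 1] < -eps)
            or (values[i] < -eps and values[i + 1] > eps)]

# A step edge between samples 31 and 32 of an otherwise flat scan line.
scan_line = [0.0] * 32 + [1.0] * 32
filtered = convolve_same(scan_line, second_derivative_gaussian(2.0))
edge_positions = zero_crossings(filtered)
```

In the theory the same operation is carried out at four mask sizes in each eye, and it is the resulting zero-crossings, not the filtered intensities, that are matched.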

The statistical analysis that we gave for the interval distribution of zero-crossings 
of oriented filters is not valid for center-surround receptive fields. This is important, 
because the results of this analysis enable one to make quantitative predictions about the 
extent of Panum's fusional area under various conditions. 

The reason why one needs to modify the analysis is that the Fourier transforms 
of the oriented and non-oriented filters are different. That of the oriented filter consists 
roughly of two infinite vertical stripes, symmetrically placed either side of the $\omega_y$ axis.
The cross-section looks like two camel humps (see Marr and Poggio, 1979, Table 1). The
Fourier transform of the center-surround filter, on the other hand, looks like a ring
around the origin in the Fourier plane $(\omega_x, \omega_y)$. In order to compute the interval
distribution between zero-crossings, measured along horizontal scan lines in the image, one
has to use the one-dimensional Fourier spectrum obtained by projecting the filter's
two-dimensional transform onto the $\omega_x$-axis. The projections are clearly different for the
two filters, and hence the results of the statistical analysis differ. 
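
The difference between the two projections can be seen in a small numerical sketch. The spectra below are idealized binary masks (a ring and a pair of vertical stripes), which is an assumption made purely for illustration; the true transforms are smooth, and the grid size, `centre` and `half_width` are arbitrary values.

```python
import math

def make_spectrum(kind, half_size=32, centre=16.0, half_width=2.0):
    """Idealized 2-D spectrum on an integer grid; rows index w_y, columns index w_x."""
    axis = range(-half_size, half_size + 1)
    rows = []
    for wy in axis:
        row = []
        for wx in axis:
            if kind == "ring":
                radius = math.hypot(wx, wy)   # non-oriented (center-surround) filter
            else:
                radius = abs(wx)              # oriented filter: two vertical stripes
            row.append(1.0 if abs(radius - centre) <= half_width else 0.0)
        rows.append(row)
    return rows

def project_onto_wx(spectrum):
    """Sum the spectrum over w_y, leaving a 1-D function of w_x."""
    return [sum(row[c] for row in spectrum) for c in range(len(spectrum[0]))]

half_size = 32
ring_projection = project_onto_wx(make_spectrum("ring", half_size))
stripe_projection = project_onto_wx(make_spectrum("stripes", half_size))

# Column index half_size corresponds to w_x = 0.
ring_at_zero = ring_projection[half_size]
stripes_at_zero = stripe_projection[half_size]
```

In this idealization the stripes project to a band-pass spectrum with no energy near $\omega_x = 0$, whereas the ring contributes at every $|\omega_x|$ up to its outer radius, including $\omega_x = 0$.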






The results of changing to center-surround filters are as follows. If no restriction 
is placed on the relative orientations of the zero-crossings that are matched between the
two images, the results are numerically about the same as for oriented filters expressed in
the usual units of $w_{1-D}$. This is the width of the central part of the receptive field as
projected onto a line, and it is what Wilson measured in his experiments. 

If one places restrictions on the relative orientations of the matched 
zero-crossings, the distance between compatible zero-crossings increases dramatically. For 
example, if one requires that their orientations must be within 30° before they can match, 
the probability of finding two candidate zero-crossings within $\pm w_{1-D}$ decreases from about
50% for the two previous cases to about 10% (Grimson, in preparation). Furthermore,
Eric Grimson has found, in statistical measurements made on the results of running his
program on 50% random-dot stereograms, that the empirical findings match closely the
theoretical predictions. 
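
The effect of an orientation restriction on the set of candidate matches can be sketched as follows. The data layout (a zero-crossing as a pair of horizontal position and orientation in degrees), the function names and the sample values are hypothetical; only the $\pm w$ search range and the 30° tolerance come from the text.

```python
def orientation_difference(a_deg, b_deg):
    """Smallest angle between two undirected orientations, in degrees (0 to 90)."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)

def candidate_matches(left_zc, right_zcs, w, max_orientation_diff=None):
    """Right-image zero-crossings within +/- w of the left one that are also
    within the orientation tolerance (if one is given)."""
    x_left, orient_left = left_zc
    candidates = []
    for x_right, orient_right in right_zcs:
        if abs(x_right - x_left) > w:
            continue
        if (max_orientation_diff is not None
                and orientation_difference(orient_left, orient_right) > max_orientation_diff):
            continue
        candidates.append((x_right, orient_right))
    return candidates

left = (100.0, 90.0)
right = [(97.0, 85.0), (104.0, 20.0), (120.0, 90.0)]

unrestricted = candidate_matches(left, right, w=5.0)
restricted = candidate_matches(left, right, w=5.0, max_orientation_diff=30.0)
```

With no restriction two candidates fall inside the search range and the match is ambiguous; with the 30° tolerance only one survives.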



3. The Pools are only for Pulling 

The theory is not definite on the neural implementation of the algorithm or the 
neural representation of disparity. Disparity could be represented either by many neurons 
each sharply tuned to a particular disparity or by two or three pools of more broadly 
tuned neurons. What the theory does require is the existence of three pools of the
second kind for the purpose of disambiguation. According to the theory, in up to 50% of 
the cases possible matches could be ambiguous, and this ambiguity is resolved by 
"pulling," that is by consulting the sign (convergent, divergent, near zero) of neighboring 
successful matches. 
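
A minimal sketch of pulling, under stated assumptions: disparities are signed numbers with positive taken to mean convergent (an arbitrary convention), `tol` is a hypothetical threshold for the near-zero pool, and a simple majority vote stands in for whatever the neural pools actually compute.

```python
def sign_class(disparity, tol):
    """Classify a disparity as +1 (convergent), -1 (divergent) or 0 (near zero)."""
    if abs(disparity) <= tol:
        return 0
    return 1 if disparity > 0 else -1

def pull(candidates, neighbour_disparities, tol=0.5):
    """Pick the ambiguous candidate whose sign class agrees with the dominant
    class among neighbouring unambiguous matches; None if undecided."""
    counts = {-1: 0, 0: 0, 1: 0}
    for d in neighbour_disparities:
        counts[sign_class(d, tol)] += 1
    best = max(counts, key=counts.get)
    if list(counts.values()).count(counts[best]) > 1:
        return None  # tie between pools: no pulling possible
    agreeing = [c for c in candidates if sign_class(c, tol) == best]
    return agreeing[0] if len(agreeing) == 1 else None

# Two candidate matches of opposite sign; most neighbours are convergent.
chosen = pull([-3.0, 4.0], [2.5, 3.0, 3.5, -0.2])
undecided = pull([-3.0, 4.0], [])
```

Only the sign of the neighbouring matches is consulted, which is why three broadly tuned pools suffice for this step.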



4. Some Consequences of Pulling 

The operation of pulling amounts to a kind of local averaging and, therefore, 
tends to reduce the rate at which the system can follow spatial changes in disparity 
(corrugations). Tyler (1978), for example, showed that the bandwidth of stereopsis is
limited to 3-4 cycles/degree. This may help to determine the neighborhood size over 
which pulling takes place. 
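
To make the link between averaging and bandwidth concrete, suppose (as an assumption; the theory does not specify the form of the averaging) that pulling acts like a uniform average over a neighborhood of width $W$ degrees. A sinusoidal disparity corrugation of spatial frequency $f$ cycles/degree is then attenuated by $|\sin(\pi f W)/(\pi f W)|$. The sketch below computes this factor and checks it against a direct numerical average.

```python
import math

def box_attenuation(freq_cpd, width_deg):
    """Amplitude factor for a sinusoid of frequency f averaged over a box of width W."""
    a = math.pi * freq_cpd * width_deg
    return 1.0 if a == 0 else abs(math.sin(a) / a)

def averaged_amplitude(freq_cpd, width_deg, samples=2001):
    """Numerical average of cos(2*pi*f*x) over a window centred on a crest (x = 0)."""
    total = 0.0
    for k in range(samples):
        x = -width_deg / 2.0 + width_deg * k / (samples - 1)
        total += math.cos(2.0 * math.pi * freq_cpd * x)
    return total / samples

analytic = box_attenuation(3.0, 0.1)
numeric = averaged_amplitude(3.0, 0.1)
```

Under this box model the response first vanishes at $f = 1/W$, so a measured cut-off frequency would constrain $W$ directly; a different averaging profile would change the numbers but not the argument.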

If there are cells that implement the pulling process they might for example have 
the following characteristics: 

1) They may be sensitive to the disparity of a thin bar, but rather insensitive to
its position in the receptive field.

2) The effect of introducing a second bar within the receptive field should
depend on whether its disparity has the same sign (+, -, 0) as that of the first bar.

3) The disparities involved here are small, on the order of the optimal width of 
the bar. These cells are not to be identified with the "near" and "far" neurons described 
physiologically by Poggio and Fisher and other authors (Poggio and Fisher, 1977; von der 
Heydt et al., 1978) and which we discuss in the next section. 



5. Control of Vergence Movements and Rough Sense of Depth 

Our theory concentrated on stereo fusion and we neglected the fact that vergence 
movements may be controlled not only by the matching of zero-crossings, but also by 
more general and weaker estimates of disparity. Thus, our prediction 13, that for 
disparities outside the range of the largest channel, the control of additional vergence 
movements should exhibit the behavior of a random search, is not a necessary 
consequence of the theory. 

There are various simple ways of estimating whether the visible surface lies 
convergent or divergent to the current fixation. These methods, however, would deliver 
only information about the sign of the disparity and not about the shape of the surface. 
One of these ways is simply to compare the number of possible convergent and divergent
matchings. Such methods can be quantified. For example, using standard signal 
detection criteria, if the number of disparity detectors falls off inversely with disparity, 
then the criterion for detecting the existence of a convergent (divergent) surface at 
disparity d varies with the square root of the area of that surface. Such a relation has 
been suggested by B. Julesz and P. Burt, working with dynamic random-dot stereograms 
(personal communication). The important characteristics of such estimators are: 

a) They can provide a rough sense of depth and hence drive eye movements. 

b) They can provide no evidence about the precise shape of the region. 

c) No depth discrimination between two convergent, or two divergent planes will 
be possible. 

These points are essentially contained in our prediction 7 (p. 322, Marr and
Poggio, 1979).

The neural substrate of such operations may perhaps be identified tentatively
with the "near" and "far" neurons described by several physiologists. These have typical
disparity ranges of a degree or more (in the macaque), which is much larger than the few
minutes required for the pools we discussed earlier (see the diplopic detectors of Figure 7
and text, Marr and Poggio, 1979).
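
The simplest of the estimators described in this section, comparing the number of possible convergent and divergent matchings, might be sketched as follows. The representation of zero-crossings as horizontal positions, the sign convention (a right-image offset greater than zero counted as convergent) and the function name are assumptions made for the example.

```python
def rough_disparity_sign(left_xs, right_xs, max_disparity):
    """Return +1, -1 or 0 according to whether convergent or divergent
    candidate pairings are more numerous within the disparity range."""
    convergent = 0
    divergent = 0
    for x_left in left_xs:
        for x_right in right_xs:
            d = x_right - x_left
            if 0 < d <= max_disparity:
                convergent += 1
            elif -max_disparity <= d < 0:
                divergent += 1
    return (convergent > divergent) - (convergent < divergent)

left_positions = [10.0, 30.0, 50.0]
right_positions = [14.0, 34.0, 54.0]
estimate = rough_disparity_sign(left_positions, right_positions, max_disparity=8.0)
```

The estimator returns only a sign, which is enough to choose the direction of a vergence movement but carries no information about the shape of the surface, in line with points (a) to (c).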

6. The Trigonometry of Stereopsis 

The recovery of depth and surface orientation from disparity and its rate of 
change across the image is a matter of simple trigonometry. The formulae raise some 
points of interest, however, and so we give them here.

Figure 1(a) shows a top view and Figure 1(b) a side view of the geometry of the
situation. It is straightforward to check that the rate of change of disparity measured
vertically across the image is 

$$\frac{\partial\phi}{\partial i_v} = -\frac{A\,l}{(B+l)^2 + A^2}\,\cot\theta$$

where $\theta$ is the vertical component of surface orientation.
The horizontal situation is given by

$$\frac{\partial\phi}{\partial i_h} = \frac{A^2 + B(B+l) - A\,l\cot\beta}{A^2 + (B+l)^2}$$

with $\beta$ the corresponding horizontal component.
Notice that this expression reduces to 1 when the surface coincides with the line of sight
from the right eye.

The interesting feature of these formulae is that the right hand side essentially
has a factor of $1/l$, $A$ and $B$ being small. In other words, perceived surface orientation
for a fixed rate of change of disparity across the image should depend upon the current
estimate of distance from the viewer to the fixation point. This observation is reflected 
in the perceptual fact that as one increases the viewing distance to a random-dot 
stereogram, the perceived surface orientation steepens. 
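
The vertical formula and its dependence on viewing distance can be checked numerically. The sketch assumes that the angle between the two lines of sight is $\arctan[A/(B+l)]$ and that a small vertical step $di$ in visual angle changes the distance to the surface by $l \cot\theta \, di$; both follow from our reading of Figure 1 and are not stated in this form in the text. The value 0.065 for $A$ (metres) is purely illustrative.

```python
import math

def disparity_angle(A, B, l):
    """Angle between the two lines of sight for a point at distance l from the left eye."""
    return math.atan2(A, B + l)

def vertical_disparity_gradient(A, B, l, theta):
    """Closed-form rate of change of disparity with vertical image angle."""
    return -A * l / ((B + l) ** 2 + A ** 2) / math.tan(theta)

def numeric_vertical_gradient(A, B, l, theta, di=1e-6):
    """Central finite-difference check, using the assumed relation dl = l * cot(theta) * di."""
    dl = l * di / math.tan(theta)
    return (disparity_angle(A, B, l + dl) - disparity_angle(A, B, l - dl)) / (2.0 * di)

def theta_from_gradient(A, B, l, gradient):
    """Invert the closed form: the slope implied by a gradient at assumed distance l."""
    return math.atan2(A * l, -gradient * ((B + l) ** 2 + A ** 2))

A, B = 0.065, 0.0
gradient = vertical_disparity_gradient(A, B, 0.5, 1.0)

theta_near = theta_from_gradient(A, B, 0.5, gradient)  # recovers the original slope
theta_far = theta_from_gradient(A, B, 1.0, gradient)   # same gradient, doubled distance
```

For the same disparity gradient the recovered angle between surface and line of sight is smaller at the larger distance, that is, the surface is interpreted as more steeply slanted in depth, which is the qualitative effect described above.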

Figure 1. (a) A top view of the geometry of the two eyes looking at a point $P$ a distance $l$
from the left eye. The line of sight is not necessarily perpendicular to the line joining the
two eyes $L$ and $R$, and the difference is described by the angle $\omega$ as illustrated. The true
inter-ocular distance is $\delta$, and the effective inter-ocular distance for this
line of sight is $\delta \cos\omega$. The angle between the lines of sight from the two eyes is $\phi$, and it is
the differences in the values of $\phi$ for different points $P$ that are normally called disparities.
The lengths $A = \delta \cos\omega$ and $B = \delta \sin\omega$ are useful geometrical quantities.

(b) A side view of the same situation. The point $P$ is shown lying on a plane that slopes
vertically, and its slope at $P$ is described by the angle $\theta$. Only the left eye $L$ is shown in this
diagram, and again the distance $l$ refers to the distance from the left eye. In order to recover
surface orientation, one needs to recover the angle $\theta$.

7. Remarks about the Domain of the Matching Process

According to our theory the items that are matched between the two images are
zero-crossings and terminations obtained from Wilson's four channels. This theory has
now been implemented and actually works on natural images and on random-dot
stereograms (Grimson and Marr, 1979). It is, however, not necessary that matching take 
place at this early stage. It could be that one first finds edges and groups and then 
matches these, which would amount to matching two primal sketches (Marr, 1976) rather 
than two sets of zero-crossing segments. 

An argument against this is one of precision. When zero-crossings are first 
extracted their localization may be extremely precise and, hence, measurements of stereo
disparity can easily be made very accurately. The longer one waits and the more
complicated are the grouping operations that one carries out before matching, the more
difficult it would be to maintain precision. While this is not an important consideration
for rough estimates of disparity, it is probably crucial for the fusion process itself. We
therefore feel that although the matching of higher order primitives is very likely to be
used for guiding eye movements and obtaining rough sensations of depth, it would be
surprising if the fusional process itself involved primitives substantially more complex than
zero-crossings and terminations. 
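
The precision argument can be illustrated with a sketch in which zero-crossings are located to a fraction of a sample by linear interpolation between neighbouring filtered values. Linear interpolation is our assumption for the example, not a claim about the implementation or about the visual system, and the sample values are invented.

```python
def subsample_zero_crossings(values):
    """Positions (in fractional samples) where a filtered scan line crosses zero,
    estimated by linear interpolation between adjacent samples of opposite sign."""
    positions = []
    for i in range(len(values) - 1):
        a, b = values[i], values[i + 1]
        if a * b < 0:
            positions.append(i + a / (a - b))
    return positions

# Hypothetical filtered values along corresponding scan lines in the two eyes.
left_filtered = [2.0, 1.0, -1.0, -2.0]
right_filtered = [2.0, 3.0, -1.0, -2.0]

left_position = subsample_zero_crossings(left_filtered)[0]
right_position = subsample_zero_crossings(right_filtered)[0]
disparity = right_position - left_position
```

Both crossings lie between the same pair of samples, yet the interpolated positions differ by a quarter of a sample. A disparity this small is available only while the raw filtered values are still at hand, which is the sense in which later grouping operations would make precision harder to maintain.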